Fuzzy Clustering of Categorical Attributes and its Use in Analyzing Cultural Data

نویسندگان

  • George E. Tsekouras
  • Dimitris Papageorgiou
  • Sotiris B. Kotsiantis
  • Christos Kalloniatis
  • Panayiotis E. Pintelas
چکیده

We develop a three-step fuzzy logic-based algorithm for clustering categorical attributes, and we apply it to analyze cultural data. In the first step the algorithm employs an entropy-based clustering scheme, which initializes the cluster centers. In the second step we apply the fuzzy c-modes algorithm to obtain a fuzzy partition of the data set, and the third step introduces a novel cluster validity index, which decides the final number of clusters. Keywords—Categorical data, cultural data, fuzzy logic clustering, fuzzy c-modes, cluster validity index.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A fuzzy SV-k-modes algorithm for clustering categorical data with set-valued attributes

In this paper, we propose a fuzzy SVk -modes algorithm that uses the fuzzy k -modes clustering process to cluster categorical data with set-valued attributes. In the proposed algorithm, we use Jaccard coefficient to measure the dissimilarity between two objects and represent the center of a cluster with set-valued modes. A heuristic update way of cluster prototype is developed for the fuzzy par...

متن کامل

Numerical and Categorical Attributes Data Clustering Using K- Modes and Fuzzy K-Modes

Most of the existing clustering approaches are applicable to purely numerical or categorical data only, but not the both. In general, it is a nontrivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering ...

متن کامل

New Approach for Customer Clustering by Integrating the LRFM Model and Fuzzy Inference System

This study aimed at providing a systematic method to analyze the characteristics of customers’ purchasing behavior in order to improve the performance of customer relationship management system. For this purpose, the improved model of LRFM (including Length, Recency, Frequency, and Monetary indices) was utilized which is now a more common model than the basic RFM model apt for analyzing the cus...

متن کامل

Improved Crisp and Fuzzy Clustering Techniques for Categorical Data

Clustering is a widely used technique in data mining application for discovering patterns in underlying data. Most traditional clustering algorithms are limited in handling datasets that contain categorical attributes. However, datasets with categorical types of attributes are common in real life data mining problem. For these data sets, no inherent distance measure, like the Euclidean distance...

متن کامل

A fuzzy c-means-type algorithm for clustering of data with mixed numeric and categorical attributes employing a probabilistic dissimilarity functional

Gath–Geva (GG) algorithm is one of the most popular methodologies for fuzzy c-means (FCM)-type clustering of data comprising numeric attributes; it is based on the assumption of data deriving from clusters of Gaussian form, a much more flexible construction compared to the spherical clusters assumption of the original FCM. In this paper, we introduce an extension of the GG algorithm to allow fo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004